Refactor Dynamic to Static #7368
Conversation
Since you always run the passes at tvm/src/relay/transforms/dynamic_to_static.cc line 209 (commit 384714b): can we assume that the compile-time regression is worst for BERT? I don't recall infer type or fold constant being slow on other models.
cc @t-vi who might be interested in incremental type inference
Heh, yeah, I had hoped you fixed it. 😉 I think @jroesch had looked into it more than I did at some point (in the context of #6274). My impression is that part of the difficulty is that in-place graph operations are not a good fit with how things work in TVM in general, and the frequent copying we do removes the type info. If memory serves me well, this was the main reason for doing the incremental type inference "manually" in the PyTorch frontend.
(force-pushed from efc5d00 to 7a33f9d)
I tried this early on. Unfortunately, PrepareArgs ends up making a copy of the IR to do type inference, and then we end up with two different versions of the input variables depending on whether the op that uses them has a dynamic op before it; this breaks several unit tests. To fix it, I would need to do infer_type/constant folding on every op during traversal, but without incremental type inference that's impossibly slow. This is a middle ground that fixes the problem without too much of a performance hit. A rough sketch of the idea is below.
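For illustration, here is a minimal Python sketch of that middle ground. The real change lives in the C++ pass; `prepare_input` below is a hypothetical stand-in written against TVM's public relay.transform API, not the actual implementation:

```python
# Hedged sketch: fold an op's argument down to a constant, but only when
# it is not one already. Names here are illustrative, not TVM internals.
import tvm
from tvm import relay


def prepare_input(expr):
    if isinstance(expr, relay.Constant):
        return expr  # already constant: skip the expensive passes
    # Wrap the argument in its own function so InferType/FoldConstant
    # run on a small module rather than the whole program.
    func = relay.Function(relay.analysis.free_vars(expr), expr)
    mod = tvm.IRModule.from_expr(func)
    mod = relay.transform.InferType()(mod)
    mod = relay.transform.FoldConstant()(mod)
    return mod["main"].body
```

Skipping arguments that are already constants avoids re-running the two passes where they cannot make any progress.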
@t-vi I feel your pain; a few people at OctoML are looking at a possible rewrite of the type inferencer in the coming months to fix some of these issues.
The worst model I've seen with this pass is ONNX SSD-Mobilenet, which takes about 3 minutes and prompted all of the dynamic rank fixes.
thanks @mbrookhart |
* DynamicToStatic Refactor
* fix test
* add regression tests
* cleanup
* skip PrepareInput if the arg is already a constant
* fix an issue with type inference with global functions
I recently spent a lot of time fighting dynamic rank issues in a kind of crazy ONNX model. Fixing it required doing incremental dynamic-to-static conversion before type inference. This PR changes the logic of dynamic to static: instead of rewriting every dynamic op in a single traversal and only then running type inference and constant folding once, it runs type inference and constant folding on each op's inputs during the traversal, so shapes computed upstream become constants that downstream dynamic ops can consume.
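As a minimal, runnable illustration of the effect of this pass (my sketch, using TVM's public Python API rather than the C++ changes in this PR, and assuming a build where relay.reshape dispatches to dyn.reshape when given an Expr as the new shape):

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(2, 8), dtype="float32")
# Passing an Expr as newshape produces a dynamic (dyn.reshape) op;
# shape_of(x) is constant-foldable here, so the op can become static.
y = relay.reshape(x, relay.shape_of(x))
mod = tvm.IRModule.from_expr(relay.Function([x], y))

mod = relay.transform.InferType()(mod)
mod = relay.transform.DynamicToStatic()(mod)
print(mod)  # the dyn.reshape should now be a static reshape
```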
This has the advantage that it can analyze and simplify those crazy dynamic rank and control flow graphs, but it has the disadvantage of being slower than the previous version, because we call infer_type and constant folding many more times.
Performance checking shows that this takes a BERT compile from ~15 seconds to ~60 seconds. This should be fixable when incremental type inference becomes available.
Thanks,
Matthew
cc @masahi @jroesch @tmoreau89 @jwfromm @electriclilies